Improving Readability for Automatic Speech Recognition Transcription

Abstract

Modern Automatic Speech Recognition (ASR) systems can achieve high performance in terms of recognition accuracy. However, even a perfectly accurate transcript can still be challenging to read due to grammatical errors, disfluency, and other noise common in spoken communication. These readability issues, introduced by speakers and ASR systems, impair the performance of downstream tasks and the understanding of human readers. In this work, we present a task called ASR post-processing for readability (APR) and formulate it as a sequence-to-sequence text generation problem. The APR task aims to transform noisy ASR output into text that is readable for humans and downstream tasks while maintaining the semantic meaning of the speakers. We further study the APR task in terms of a benchmark dataset, evaluation metrics, and baseline models. First, to address the lack of task-specific data, we propose a method to construct a dataset for the APR task using data collected for grammatical error correction. Second, we utilize metrics adapted or borrowed from similar tasks to evaluate model performance on the APR task. Lastly, we use several typical pre-trained models as baseline models for the APR task. Furthermore, we fine-tune the baseline models on the constructed dataset and compare their performance with a traditional pipeline method in terms of the proposed evaluation metrics. Experimental results show that all fine-tuned baseline models perform better than the traditional pipeline method, and our RoBERTa-based model outperforms the pipeline method by 4.95 and 6.63 BLEU points on two test sets, respectively. Case studies further reveal the ability of the model to improve the readability of ASR transcripts.
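
The abstract reports gains in BLEU points. As a rough illustration of what that metric measures, the sketch below computes an unsmoothed, single-reference, sentence-level BLEU on whitespace tokens. It is a simplified stand-in, not the evaluation script used in the paper. The function names and the example sentences are hypothetical, and the score is on a 0 to 1 scale (multiply by 100 to get "BLEU points").

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, hypothesis, max_n=4):
    """Unsmoothed sentence-level BLEU against one reference, in [0, 1]."""
    ref, hyp = reference.split(), hypothesis.split()
    if not hyp:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        # Clip each hypothesis n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # no smoothing: an n-gram order with no match gives 0
        log_precision_sum += math.log(overlap / total)
    # Brevity penalty: penalize hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(log_precision_sum / max_n)


# Hypothetical example: a disfluent transcript vs. a cleaned-up version.
reference = "i want to book a flight to boston"
raw_score = sentence_bleu(reference, "i uh i want to book a a flight to boston")
cleaned_score = sentence_bleu(reference, "i want to book a flight to boston")
print(raw_score < cleaned_score)
```

In this toy case the filled pause and the repeated word lower the n-gram precision of the raw transcript, so the cleaned version scores higher against the readable reference.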

Similar resources

Improving Automatic Speech Transcription for Multimedia Content

Automatic Speech Recognition systems are increasingly being used in multimedia retrieval applications, where speech recognition is used to aid the creation of high-quality transcripts for data such as multimedia meeting recordings, lectures and presentations, video and audio libraries, and broadcast news. These transcripts are then used for indexing and retrieval of the multimedia content over ...

Improving the Readability of Class Lecture Automatic Speech Recognition Results Using Multiple Hypotheses

This paper presents a method for improving the readability of class lecture Automatic Speech Recognition (ASR) results, which hitherto have been difficult for humans to understand, even in the absence of recognition errors. This is because the speech in a class lecture is relatively casual and contains many ill-formed utterances with filled pauses, restarts, and so on. Recently there has been e...

Improving the performance of MFCC for Persian robust speech recognition

Mel-frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing which applies to t...

Improving automatic emotion recognition from speech signals

We present a speech signal driven emotion recognition system. Our system is trained and tested with the INTERSPEECH 2009 Emotion Challenge corpus, which includes spontaneous and emotionally rich recordings. The challenge includes classifier and feature sub-challenges with five-class and two-class classification problems. We investigate prosody related, spectral and HMM-based features for the ev...

Improving automatic speech recognition using tangent distance

In this paper we present a new approach to variance modelling in automatic speech recognition (ASR) that is based on tangent distance (TD). Using TD, classifiers can be made invariant w.r.t. small transformations of the data. Such transformations generate a manifold in a high dimensional feature space when applied to an observation vector. While conventional classifiers determine the distance b...

Journal

Journal title: ACM Transactions on Asian and Low-Resource Language Information Processing

Year: 2023

ISSN: 2375-4699, 2375-4702

DOI: https://doi.org/10.1145/3557894